Aspects of pseudorank estimation methods based on the eigenvalues of principal component analysis of random matrices
Abstract
Analytical instruments that produce a data matrix for a single chemical sample are now widely used. For a successful analysis of these data, however, an accurate estimate of the pseudorank of the matrix is often a crucial prerequisite. Many methods for estimating the pseudorank are based on the eigenvalues obtained from principal component analysis (PCA). This paper discusses methods that exploit the essential similarity between the residuals of PCA of the test data matrix and the elements of a random matrix. In the PCA literature these methods are commonly called parallel analysis. Attention is paid to several aspects that have to be considered when applying such methods. For some of these aspects, asymptotic results can be found in the statistical literature. In this study, Monte Carlo simulations are used to investigate the practical implications of these theoretical results. It is shown that for sufficiently large matrices the distribution of the measurement error does not significantly influence the results. Down to a very small signal-to-noise ratio, the ratio of the number of rows to the number of columns is the major influence on the expected value of the eigenvalues associated with the residuals. The consequences are illustrated for two functions of the eigenvalues: the logarithm of the eigenvalues and Malinowski’s reduced eigenvalues. Both methods are graphical and have been applied in the past with considerable success to a variety of data. Malinowski’s reduced eigenvalues are of special interest since they have been used to construct an F-test. Finally, a modification is proposed for pseudorank estimation methods that are based on the principle of parallel analysis.
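The quantities named in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the simulated rank-2 data, the matrix size, the noise level, and all function names are assumptions made for illustration. The reduced eigenvalues follow Malinowski's definition, lambda_j / ((r - j + 1)(c - j + 1)) for a matrix with r rows and c columns, and the random-matrix reference is a plain Monte Carlo average over matrices with independent standard normal elements.

```python
import numpy as np


def eigenvalues(X):
    """Eigenvalues of X^T X (squared singular values), largest first."""
    s = np.linalg.svd(X, compute_uv=False)
    return s ** 2


def reduced_eigenvalues(eigvals, n_rows, n_cols):
    """Malinowski's reduced eigenvalues: lambda_j / ((r - j + 1)(c - j + 1))."""
    j = np.arange(1, len(eigvals) + 1)
    return eigvals / ((n_rows - j + 1) * (n_cols - j + 1))


def random_matrix_eigenvalues(n_rows, n_cols, n_trials=200, seed=0):
    """Monte Carlo mean of the ordered eigenvalues of i.i.d. N(0, 1) matrices."""
    rng = np.random.default_rng(seed)
    sims = [
        eigenvalues(rng.standard_normal((n_rows, n_cols)))
        for _ in range(n_trials)
    ]
    return np.mean(sims, axis=0)


def simulate_data(n_rows=50, n_cols=20, rank=2, noise_sd=0.01, seed=1):
    """Hypothetical test matrix: low-rank signal plus normal measurement error."""
    rng = np.random.default_rng(seed)
    scores = rng.uniform(0.0, 1.0, (n_rows, rank))
    loadings = rng.uniform(0.0, 1.0, (rank, n_cols))
    noise = noise_sd * rng.standard_normal((n_rows, n_cols))
    return scores @ loadings + noise


X = simulate_data()
lam = eigenvalues(X)
rev = reduced_eigenvalues(lam, *X.shape)
lam_random = random_matrix_eigenvalues(*X.shape)

# The two graphical diagnostics: log eigenvalues and reduced eigenvalues.
for j, (l, r) in enumerate(zip(lam, rev), start=1):
    print(f"{j:2d}  log10(lambda) = {np.log10(l):8.3f}   REV = {r:.3e}")
```

In a plot of these values, the eigenvalues associated with the signal stand well above the rest. The remaining ones can be compared, after scaling by the noise variance, with `lam_random`, whose shape depends on the ratio of rows to columns.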
Similar resources
APPLICATION OF THE RANDOM MATRIX THEORY ON THE CROSS-CORRELATION OF STOCK PRICES
The analysis of cross-correlations is extensively applied to understand interconnections in stock markets. A variety of methods are used to study stock cross-correlations, including the Random Matrix Theory (RMT), the Principal Component Analysis (PCA) and the Hierarchical Structures. In this work, we analyze cross-correlations between price fluctuations of 20 company stocks...
Aspects of pseudorank estimation methods based on an estimate of the size of the measurement error
The estimation of the pseudorank of a matrix, i.e., the rank of a matrix in the absence of measurement error, is a major problem in multivariate data analysis. In the practice of analytical chemistry it is often even the only problem. An important example is the determination of the purity of a chromatographic peak. In this paper we discuss three pseudorank estimation methods that make use of ...
General practitioners' views on key factors affecting their desired income: A principal component analysis approach
Background: Based on the target income hypothesis, the economic behavior of physicians is mainly affected by their target income. This study aimed at designing an instrument to explain how general practitioners (GPs) set their desired income. Methods: A self-administered questionnaire of affecting factors on GPs' target income was extracted from literature reviews and a small qual...
Feature reduction of hyperspectral images: Discriminant analysis and the first principal component
When the number of training samples is limited, feature reduction plays an important role in classification of hyperspectral images. In this paper, we propose a supervised feature extraction method based on discriminant analysis (DA) which uses the first principal component (PC1) to weight the scatter matrices. The proposed method, called DA-PC1, copes with the small sample size problem and has...
Spectrum estimation for large dimensional covariance matrices using random matrix theory
Estimating the eigenvalues of a population covariance matrix from a sample covariance matrix is a problem of fundamental importance in multivariate statistics; the eigenvalues of covariance matrices play a key role in many widely used techniques, in particular in Principal Component Analysis (PCA). In many modern data analysis problems, statisticians are faced with large datasets where the sample si...